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ABSTRACT 

A relational document retrieval system is developed 
that is primarily concerned with hierarchic tree structures and 
lateral links. The lateral linkages are relationships that cannot be 
represented within the tree structure representation. A number of 
annotated examples are given along with a detailed description of the 
relational data structure and the programs that manipulate and search 
this structure. The system was programmed in a string manipulating 
language designed for information retrieval purposes. (Author) 
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THE IMPLEMENTATION OF A RELATIONAL DOCUMENT RETRIEVAL SYSTEM 



Yean-Hsi Chang, M.S. 

Department of Electrical Engineering 
University of Illinois, 1970 

A relational document retrieval system is developed that is primarily 
concerned with hierarchic tree structures and lateral links. The lateral 
linkages are relationships that cannot be represented within the tree 
structure representation. A number of annotated examples are given along 
with a detailed description of the relational data structure and the programs
that manipulate and search this structure. The system was programmed in
ISL, a string manipulating language designed for information retrieval
purposes.
I. INTRODUCTION

Document retrieval systems have been intensively studied in recent
years. One of the more interesting aspects of this area is that of file
structuring. A file structure that is very effective in one system may be
unacceptable in another. A review of some of the well-known document
retrieval systems shows that many problematic aspects of file structure
still remain. The AESOP system [7] uses a hierarchic tree structure to
handle the category titles and tables to handle the items in the categories.
The BOLD system [8] uses lists of subject categories and matrices to show the
relation between documents and index terms. The SMART system [9] uses an
inverted file of clusters. But none of these considers relations between
documents. For example, two documents could have the same descriptors and
yet discuss entirely different subjects; conversely, two documents could be
concerned with the same topic but have entirely different descriptors; and
the relationship between certain documents could be conditional upon the
intent of the user. These relationships cannot be expressed clearly by a
simple list or a simple hierarchic tree structure. In view of this fact,
Esser [1] has done some theoretical work on hierarchic tree structuring
together with lateral links (LL) relating the terms used in the field of
coding theory. The work reported here represents the development and
implementation of the computational aspects of a relational data file for
document retrieval employing the tree structuring and lateral-link concepts.

In order to give an overall picture of a system that employs the
techniques described above and to show the effectiveness of such an approach,
a number of annotated examples are given in Chapter II. Following these
examples, a detailed description of the relational data structure developed
is given in Chapter III. In Chapter IV a formal description of all the
features available in the system is given.

The Appendix contains a description of the programs, a user's
manual and a sample tree structure from the relational model used.

This document retrieval system was programmed in ISL, a string
manipulating language designed for information retrieval [2, 3, 4, 5, 6].
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II. THE USE OF RELATIONAL INFORMATION IN DOCUMENT RETRIEVAL 

The relational document retrieval system implemented in this paper
is one which uses all the available relations between items of the data base
to give the user as much information as possible. It is also designed in such
a way that the user needs no technical background in order to do the retrieval.
The examples given in this chapter show the system's actual performance. The
detailed step-by-step description of how to operate the system is given in
Appendix B.

The data base used for this implementation is from the area of
coding theory. The names are either the categories or the terms generally
used in coding theory texts, e.g., cyclic codes, optimum codes, decoding
algorithms, binary codes, etc. Each node in the tree structure represents
a topic area.

If the user has in mind a topic in coding theory and wishes to
know more about it, then this program can be used. Suppose that a user
has "OPTIMUM CODES" in mind. After the computer has been set up by following
the procedure given in Appendix B, the statement "STATE YOUR REQUEST" will
appear on the display console as in Figure 1 and will also be typed on
the console typewriter. Now, he can type "OPTIMUM CODES" followed by a
carriage return. At this point in the program "ABORT" may be typed instead,
and program control will be passed to the monitor.

The topic that has been typed will be shown on the display console.
The program can be restarted by typing "ERASE", or continued by typing
"GO". After "GO" is typed, the display console will appear as in Figure 2.
When the right-most arrow on the screen is pressed with the light-pen,
the display console will appear as shown in Figure 3. There are 15 commands,
the functions of which are explained in detail in Appendix B. If "DWNTREE"
is selected with the light-pen, the display console will be as shown in
Figure 4.

Figure 1.

Suppose that the user wants to know the optimum codes under high
rate transmission; he can press the right-most arrow with the light-pen to get
Figure 3 shown on the display console again and select "ADD" with the
light-pen. After "HIGH RATE" is typed, followed by a carriage return, Figure 5
will be shown if "HIGH RATE" is available in the data file (in this case, it is).

By pressing the right-most arrow and selecting "PTR2DWN" with the
light-pen four times, the user obtains the picture shown in Figure 6 on the
display-console screen.

Pressing the right-most arrow and selecting "LATLINK" with the
light-pen will then present the user the information shown in Figure 7.

If he presses the right-most arrow with the light-pen and selects
"GO-ON", he will see the picture shown in Figure 8. That means there is only
one node whose name is "OPTIMUM CODES".

The user may start the retrieval process*, if he feels that the topic
names found are enough for his interests, simply by pressing the right-most
arrow and selecting "RETRIEVE" with the light-pen.

If he selects "LISTALL", he will have the whole list shown again. By
selecting "PRINT", he will have the whole list printed. But, if he selects
"RESTART", he will erase the whole list and start the program all over again.



*Program for retrieval process is not included in this implementation. 
III. THE RELATIONAL STRUCTURE DESIGN



In order to best illustrate how the relational information is stored,
a representation of the subtree given in Figure 10 will be given. Three
structure representations are described here.

A. Node Representation

Each model node consists of two parts, as shown in Figure 9. In
part one, there are seven fields. Field 1, "NEXT", contains the word
address of the next node whose name starts with the same character
as this node; field 2, "END", contains a pointer pointing to the byte
address of the end of the character string of this node; field 3,
"TYPE", contains the information which indicates the type of lateral
link function that applies to this node; field 4, "I#", contains the
item number of this node; field 5, "UPPER", contains the item number
of the node which is at one level up in the hierarchic tree structure;
fields 6 and 7, "LOWER 1" and "LOWER 2", contain the item numbers of
the first and last nodes, respectively, which are at one level down
in the hierarchic tree structure. Part two of the node is a variable
length string that contains the name of the topic represented by this
node.

Figure 11 shows how this information would look for the node
given in Figure 10.
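
The node layout described above can be illustrated with a minimal sketch. It is written in Python rather than ISL, and it is not the original program. The item numbers, the convention that 0 means "no node", the use of the name length for END, and the assumption that the nodes one level down carry consecutive item numbers from LOWER 1 to LOWER 2 are all illustrative assumptions, not facts taken from the implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    next_addr: Optional[int]  # field 1, NEXT: next node with the same first character
    end: int                  # field 2, END: end of the name string (here: its length)
    type_code: int            # field 3, TYPE: lateral link function type
    item_no: int              # field 4, I#: item number of this node
    upper: int                # field 5, UPPER: item number one level up (0 = none, assumed)
    lower1: int               # field 6, LOWER 1: first node one level down (0 = none)
    lower2: int               # field 7, LOWER 2: last node one level down (0 = none)
    name: str                 # part two: variable-length topic name


def make_node(item_no, name, upper=0, lower1=0, lower2=0, type_code=0):
    """Hypothetical helper that fills NEXT and END with placeholder values."""
    return Node(None, len(name), type_code, item_no, upper, lower1, lower2, name)


def down_tree(node: Node, by_item: Dict[int, Node]) -> List[Node]:
    """Nodes one level down, assuming they are numbered lower1..lower2."""
    if node.lower1 == 0:
        return []
    return [by_item[i] for i in range(node.lower1, node.lower2 + 1)]


def up_tree(node: Node, by_item: Dict[int, Node]) -> Optional[Node]:
    """Node one level up, or None at the top of this fragment."""
    return by_item.get(node.upper)


# The subtree of Figure 10, with made-up item numbers 10..14.
nodes = [
    make_node(10, "TYPES OF OPTIMUM CODES", lower1=11, lower2=14),
    make_node(11, "PERFECT CODE", upper=10),
    make_node(12, "QUASI-PERFECT CODE", upper=10),
    make_node(13, "MEET UPPER BOUND", upper=10),
    make_node(14, "SATISFY OPTIMUM PROPERTIES", upper=10),
]
by_item = {n.item_no: n for n in nodes}
```

Storing only the first and last lower-level item numbers keeps part one of the node at a fixed size, which is presumably why two fields suffice for any number of lower-level nodes.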

B. Lateral Link Representation

These structures consist of three to eight fields depending on
which type of lateral link function applies. Field 1 contains the item
number of the major node, and the rest of the fields contain item
numbers of those nodes which are required by the lateral link function.
For example, for the node "OPTIMUM CODES", the lateral link representation
will be as shown in Figure 12, since the lateral link function
involved requires 7 item numbers.

Figure 9. Node layout. Part One holds the fields NEXT, END, TYPE, I#,
UPPER, LOWER 1 and LOWER 2; Part Two holds the topic name.

Figure 10. Part of the tree structure about the node whose name is "TYPES OF
OPTIMUM CODES". Nodes shown: PERFECT CODE, QUASI-PERFECT CODE, MEET UPPER
BOUND, SATISFY OPTIMUM PROPERTIES.

Figure 11. Relations between nodes (arrows mark nodes related by pointer).
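
As a rough illustration of the lateral link record, the following Python sketch (not the original ISL code) stores the major node's item number followed by the related item numbers and enforces the three-to-eight field limit stated above. The class name and all item numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LateralLink:
    major: int                # field 1: item number of the major node
    related: Tuple[int, ...]  # remaining fields: item numbers the function needs

    def __post_init__(self):
        # The text gives three to eight fields in total, field 1 included.
        if not 3 <= self.field_count <= 8:
            raise ValueError("a lateral link record has 3 to 8 fields")

    @property
    def field_count(self) -> int:
        return 1 + len(self.related)


# A link with 7 related item numbers, as for "OPTIMUM CODES" (numbers made up).
optimum_codes_link = LateralLink(major=1, related=(11, 12, 13, 14, 15, 16, 17))
```

A record with seven related item numbers uses all eight fields, which matches the upper limit given for these structures.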



C. Representation of Synonyms

These structures have the same form as the regular node representation.
The differences are:

1) Field 3 in part one indicates the synonym property.

2) Field 5 in part one contains the item number of its synonym node.

3) Fields 6 and 7 in part one contain zeros, and

4) Part two contains the name of the item which is not in the tree
structure but is a synonym.

For example, "MAX. CODES" is not in the tree structure, but its synonym
"MAX. DISTANCE SEPARABLE CODES" is. So its representation is as shown
in Figure 13.

Figure 12. Lateral link representation for node "OPTIMUM CODES".

Figure 13. Synonym representation for "MAX. CODES".
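
The synonym convention can be sketched as follows, again in Python and only as an illustration. The TYPE value used to mark a synonym, the record class, and the item numbers are assumptions; the source says only that field 3 indicates the synonym property and that field 5 holds the item number of the synonym node.

```python
from dataclasses import dataclass
from typing import Dict

SYNONYM = -1  # hypothetical TYPE value marking a synonym record


@dataclass
class Record:
    type_code: int  # field 3: SYNONYM for synonym records
    item_no: int    # field 4
    upper: int      # field 5: for a synonym record, item number of its synonym node
    lower1: int     # field 6: zero for a synonym record
    lower2: int     # field 7: zero for a synonym record
    name: str       # part two


def resolve(record: Record, by_item: Dict[int, Record]) -> Record:
    """Return the tree node a record stands for (the record itself if not a synonym)."""
    if record.type_code == SYNONYM:
        return by_item[record.upper]
    return record


# Made-up item numbers 40 and 41.
tree_node = Record(0, 40, 0, 0, 0, "MAX. DISTANCE SEPARABLE CODES")
synonym = Record(SYNONYM, 41, 40, 0, 0, "MAX. CODES")
by_item = {tree_node.item_no: tree_node, synonym.item_no: synonym}
```

Reusing field 5 for the synonym target means a synonym record needs no extra fields beyond those of a regular node.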
IV. MAJOR FUNCTIONS NECESSARY FOR THE USE OF RELATIONAL STRUCTURES



The retrieval action of this program is initiated whenever the name
of a topic or the leading portion of it is given. There are four major
functions in this implementation.

A. The node is located by its name or by the leading portion which is
given. Nodes which have the same name or the same leading portion, or
which are related to the node found either by the hierarchy tree
structure or by the lateral link functions, will also be suggested,
printed, and shown on the display console.

B. In any displayed list of nodes two movable pointers are provided
which can move up and down independently for use in performing
manipulative actions.

C. A lateral link function will be performed upon command on the
node pointed to by pointer 1 (PTR 1) if one record is required
by that lateral link function, or on the two records pointed to by
the pointers (PTR 1 and PTR 2) if two records are required by the
lateral function. Lateral functions which require more than two
records will be performed on records in the whole list.

D. Nodes which are not in the tree structure but have synonyms
will be located. The synonym will be printed and displayed as
described in A.
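
Function A, locating nodes by name or leading portion, can be sketched by following the NEXT chain that links nodes sharing a first character. The Python below is an assumption-laden illustration, not the original search routine: list positions stand in for word addresses, and "CONVOLUTIONAL CODES" is a made-up entry added only so that one chain has two members.

```python
from typing import Dict, List, Optional, Tuple


def build_chains(names: List[str]) -> Tuple[Dict[str, int], List[Optional[int]]]:
    """Link each name to the next one that starts with the same character."""
    heads: Dict[str, int] = {}   # first character -> position of first such name
    last: Dict[str, int] = {}    # first character -> position of latest such name
    nxt: List[Optional[int]] = [None] * len(names)
    for i, name in enumerate(names):
        c = name[0]
        if c in last:
            nxt[last[c]] = i     # plays the role of the NEXT field
        else:
            heads[c] = i
        last[c] = i
    return heads, nxt


def locate(prefix: str, names: List[str], heads: Dict[str, int],
           nxt: List[Optional[int]]) -> List[str]:
    """Return every name that begins with the given leading portion."""
    found: List[str] = []
    i = heads.get(prefix[0]) if prefix else None
    while i is not None:
        if names[i].startswith(prefix):
            found.append(names[i])
        i = nxt[i]
    return found


names = ["OPTIMUM CODES", "CYCLIC CODES", "DECODING ALGORITHMS",
         "BINARY CODES", "CONVOLUTIONAL CODES"]
heads, nxt = build_chains(names)
```

Only the names in one first-character group are compared, so a request never scans the whole file.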
V. CONCLUSION

The work presented in this paper is concerned with the hierarchy tree 
structure and lateral link functions in relational document retrieval 
systems. Fast access and processing have been carefully considered. The 
flexibility of this implementation is such that it is possible to apply 
these principles equally well to any other relational document retrieval 
data base which possesses hierarchy tree structures and lateral link 
function relations without any modification of the program. 
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DESCRIPTIONS OF PROGRAMS 
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Appendix A contains the following information in the order listed below: 

(1) MAIN PROGRAM 

(2) SUBPROGRAMS

(3) FLOW CHARTS 
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1. Main Program 

This implementation has been programmed on the Control Data 1604
computer at the Coordinated Science Laboratory of the University of Illinois.
The main program and the subprograms comprising this implementation were
written in the Information Search Language (ISL) [2, 3, 4, 5, 6].

There are three portions of the main program in the implementation which
will be briefly described here.

a. The first portion of the main program serves to set up the
files in the computer memory in the appropriate sequential
form. For easy and fast access, directories are built in
which the leading addresses of groups and the addresses
with respect to item numbers are kept. Twenty-two lateral
function directories are also built, one for each of the
twenty-two categories of lateral link functions.

b. The second portion of the main program accepts commands
from the user and makes a list of the required topic and
synonyms and then suggests topics which may be of further
interest to the user.

c. The remaining portion of the main program performs further
manipulations of the information and executes the desired
lateral link functions upon the user's demand.
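
The directory-building step in portion (a) might look roughly like the Python sketch below. It is only an illustration under stated assumptions: list positions stand in for memory addresses, the lateral link functions are numbered 1 to 22, and the sample records and links are made up.

```python
from typing import Dict, List, Sequence, Tuple

N_LATERAL = 22  # the text states twenty-two categories of lateral link functions

Link = Tuple[int, int, Sequence[int]]  # (function number, major item, related items)


def build_directories(records: List[Tuple[int, str]], links: List[Link]):
    """Build an item-number directory and one directory per lateral function."""
    # Item number -> address of the record (here: its position in the list).
    item_dir: Dict[int, int] = {item_no: addr
                                for addr, (item_no, _name) in enumerate(records)}
    # One directory for each of the 22 lateral link function categories.
    lateral_dirs: Dict[int, List[Tuple[int, Tuple[int, ...]]]] = {
        k: [] for k in range(1, N_LATERAL + 1)
    }
    for function_no, major, related in links:
        if function_no not in lateral_dirs:
            raise ValueError("unknown lateral function number")
        lateral_dirs[function_no].append((major, tuple(related)))
    return item_dir, lateral_dirs


records = [(10, "TYPES OF OPTIMUM CODES"), (12, "QUASI-PERFECT CODE"),
           (11, "PERFECT CODE")]
links = [(3, 10, [11, 12])]
item_dir, lateral_dirs = build_directories(records, links)
```

Building the directories once at start-up trades memory for the fast access that the text names as a design goal.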
2. Subprograms



There are 13 subprograms:

a. SETUP: This subprogram arranges the records in the appropriate
sequence and sets up the directories and the links.

b. SEARCHA: This subprogram analyzes a given character string and
locates where in the relational file it is located.

c. SEARCHB: This subprogram locates a given character string in
the portion of the relational file after a given address.

d. COMPARE: This subprogram compares two character strings of
a given length to see if they are the same.

e. LFTN: This subprogram serves to perform the 22 lateral
functions, to suggest new related nodes or to suggest the
deletion of the node found.

f. SETLL: This subprogram puts all the item numbers which will
be used in the particular lateral function into the
corresponding directory.

g. MOVPTR: This subprogram moves the pointer specified from
position 1 (pos1) to position 2 (pos2).

h. CHKXNO: This subprogram checks for the item number specified
to see if it is in the list DIRECT. The value 0 in accumulator
A indicates success; any other value indicates failure.

i. GETINO: This subprogram gets the item number of the topic in
the record with leading address given and stores it at ITNO.

j. MOVTV1: This subprogram puts the statement with given leading
address and ending address in the scope buffer and adjusts
the list DIRECT.

k. MOVTV2: When the leading address of the record is given, this
subprogram moves the topic title of that record to the scope
buffer and sets both lists DIRECT and DIRTTV.

l. GETRT1: This subprogram moves the statement between the two
given addresses to the scope buffer. Then, for the record
with leading address at RCDADR, it finds all the related records
and puts their topic titles in the scope buffer.

m. GETRT2: This subprogram gets the item numbers pointed to by
both pointer 1 (1--*) and pointer 2 (2--*) and checks to see
if the condition of the lateral link function involved is
satisfied. On success, it moves the statement between the
two given addresses and the related topic names to the scope
buffer.
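
Three of the smaller subprograms listed above can be paraphrased in Python as a hedged sketch. The function signatures are assumptions: the originals are ISL routines whose calling conventions are not given here, list positions stand in for addresses, "after a given address" is read as exclusive, and the return of 0 from chkxno mirrors the accumulator A convention only loosely.

```python
from typing import List


def compare(a: str, b: str, length: int) -> bool:
    """COMPARE: are the first `length` characters of the two strings the same?"""
    return a[:length] == b[:length]


def searchb(names: List[str], target: str, after: int) -> int:
    """SEARCHB: position of the first name after position `after` whose leading
    portion equals `target`, or -1 if there is none."""
    for i in range(after + 1, len(names)):
        if compare(names[i], target, len(target)):
            return i
    return -1


def chkxno(item_no: int, direct: List[int]) -> int:
    """CHKXNO: 0 if the item number is in the list DIRECT, 1 otherwise."""
    return 0 if item_no in direct else 1


names = ["OPTIMUM CODES", "CYCLIC CODES", "OPTIMUM CODES"]
```

Calling searchb again with the position of the previous hit finds further nodes with the same name, which is one way the "nodes which have the same name" of Chapter IV could be collected.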



3. FLOW CHARTS

In the flow charts, the following symbols are used:

a. b          Index designator.
b.            Designated index register.
c. ( )        Contents of a register or storage location.
d. (a1,a2)    Contents between byte locations a1 and a2.
USER'S MANUAL

1. To start the program:

(a) Mount tape C-140 (ISL MASTER) on logic unit 1, and tape F-137
(MAIN) on logic unit 2.

(b) Mount tape E-138 (DATAMAIN) on logic unit 8.

(c) Press the "AUTO-LOAD" button to initiate the program.

(d) Type "LOAD,/" followed by a carriage return.

(e) Type "GO,MAIN" followed by a carriage return.

2. To start the retrieval process:

The program will be ready when "STATE YOUR REQUEST" is typed on the
typewriter and, at the same time, shown on the display console.

Type the name of the topic you are interested in, or its leading portion.
Follow this by a carriage return. What you have typed will also be shown
on the display console.

Another carriage return will give you the opportunity to terminate
the program by typing "ABORT" or to change the name by typing "ERASE", in
either case followed by a carriage return.

Your command will be accepted each time the display console is showing
information. There are 15 simple commands available:

(a) PTR1UP: To move pointer 1 (1--*) up the list one step.

(b) PTR1DWN: To move pointer 1 down the list one step.

(c) PTR2UP: To move pointer 2 (2--*) up the list one step.

(d) PTR2DWN: To move pointer 2 down the list one step.

(e) DELETE: To delete the topic title pointed to by pointer 1.

(f) ADD: To add a new title at the end of the list.

(g) UPTREE: To find the title of the node one level higher than
the one pointed to by pointer 1 in the tree structure.

(h) DWNTREE: To find the title of the node one level lower than
the one pointed to by pointer 1 in the tree structure.

(i) RESTART: To erase the whole list and start the program over
again.

(j) GO-ON: To keep the list in the memory and go on with the
retrieval process.

(k) LATLINK: To apply the lateral function to the topics pointed to
by pointer 1 (and also pointer 2, or the whole list, if the function
requires).

(l) LISTALL: To restore the part of the list kept in the memory
and show it all on the TV screen.

(m) PRINT: To print out whatever is shown on the TV screen.

(n) THATSALL: To terminate the program.

(o) RETRIEVE: (Not included in this program).
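
The list-handling commands above can be modelled with a small Python sketch. It is not the original program: it covers only the two pointers, DELETE and ADD, and it assumes that both pointers start at the top of the list, that "up" means toward the first entry, and that a pointer stops at either end of the list. The command spellings PTR1UP and PTR1DWN are likewise an assumption, by analogy with PTR2UP and PTR2DWN.

```python
from typing import List, Optional


class DisplayList:
    """A displayed list of topic titles with two independent pointers."""

    def __init__(self, titles: List[str]):
        self.titles = list(titles)
        self.ptr1 = 0  # assumed starting position of pointer 1
        self.ptr2 = 0  # assumed starting position of pointer 2

    def _clamp(self, pos: int) -> int:
        # Keep a pointer inside the list (position 0 if the list is empty).
        return max(0, min(pos, len(self.titles) - 1))

    def ptr1up(self):
        self.ptr1 = self._clamp(self.ptr1 - 1)

    def ptr1dwn(self):
        self.ptr1 = self._clamp(self.ptr1 + 1)

    def ptr2up(self):
        self.ptr2 = self._clamp(self.ptr2 - 1)

    def ptr2dwn(self):
        self.ptr2 = self._clamp(self.ptr2 + 1)

    def delete(self):
        # DELETE removes the title pointed to by pointer 1.
        if self.titles:
            del self.titles[self.ptr1]
            self.ptr1 = self._clamp(self.ptr1)
            self.ptr2 = self._clamp(self.ptr2)

    def add(self, title: str):
        # ADD appends a new title at the end of the list.
        self.titles.append(title)


def execute(dl: DisplayList, command: str, argument: Optional[str] = None):
    """Dispatch one command name to the matching list operation."""
    handlers = {
        "PTR1UP": dl.ptr1up, "PTR1DWN": dl.ptr1dwn,
        "PTR2UP": dl.ptr2up, "PTR2DWN": dl.ptr2dwn,
        "DELETE": dl.delete,
    }
    if command == "ADD":
        if argument is None:
            raise ValueError("ADD needs a title")
        dl.add(argument)
    elif command in handlers:
        handlers[command]()
    else:
        raise ValueError("unsupported command: " + command)
```

A dispatch table keyed by command name keeps the selection step separate from the operations, much as the light-pen selection in the flow charts branches to one label per command.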
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SAMPLE OF THE HIERARCHIC TREE STRUCTURE USED TO TEST THE PROGRAM 
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MAIN PROGRAM CONTINUE 




SELECT WITH LIGHT-PEN, then GOTO the corresponding label:

     Command       Label
 1.  PTR1UP        PTR1U
 2.  PTR1DWN       PTR1D
 3.  PTR2UP        PTR2U
 4.  PTR2DWN       PTR2D
 5.  DELETE        DELETE
 6.  ADD           ADD
 7.  UPTREE        UPTREE
 8.  DWNTREE       DWNTREE
 9.  RESTART       RESTART
10.  GO-ON         STOR
11.  LATLINK       LLINK
12.  LISTALL       LISTALL
13.  PRINT         PRT
14.  THATSALL      THATSALL
15.  RETRIEVE      THATSALL
SUBPROGRAM SETUP CONTINUE

SUBPROGRAM LFTN CONTINUE